The Internet and other computer networks have become the backbone of information transfer. In that regard, effective analysis and searching of the associated data stores is paramount. A great many applications of computer technology would be better enabled, and would be more robust and complete, if machines could automatically process the meaning contained in spoken or written natural language. Previously, interpretations of documents were derived either from the lexical items occurring in the documents themselves, or from a statistical model derived from the corpus in which the documents appear, from a larger corpus of documents, or from both.
A novel system and method are accordingly disclosed for producing semantically rich representations of texts, exploiting semantic models to amplify and sharpen the interpretations of those texts. The method is applicable not only to producing semantic representations of texts, but also to matching the representations of multiple texts. It relies on the fact that most text strings carry a substantial amount of semantic content that is not explicit in the strings themselves, or in the mere statistical co-occurrence of those strings with other strings, but that is nevertheless extremely relevant to the text.
This additional information may be used to sharpen the representations derived directly from the text string, and also to augment the representation with content that, while not explicitly mentioned in the string, is implicit in the text. Once made explicit, that content can support text processing applications including document indexing and retrieval, document classification, document routing, document summarization, and document tagging. These enhancements also support downstream processing, such as automated document reading and understanding, online advertising placement, electronic commerce, corporate knowledge management, and business and government intelligence applications.
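By way of illustration only, the following minimal sketch shows one way a representation derived directly from a text string might be augmented with implicit content and then matched against another text. It is not the disclosed implementation. The toy `SEMANTIC_MODEL` table, its weights, and the function names `lexical_representation`, `augment`, and `cosine` are all assumptions introduced for this example, and cosine similarity is merely one convenient choice of matching measure.

```python
import math
from collections import Counter

# Hypothetical toy semantic model (an assumption for this sketch):
# each term maps to concepts it implies, with an illustrative weight.
SEMANTIC_MODEL = {
    "poodle": {"dog": 0.9, "pet": 0.6},
    "beagle": {"dog": 0.9, "pet": 0.6},
    "leash": {"dog": 0.4},
}


def lexical_representation(text):
    """Build a representation from the lexical items in the text alone."""
    return Counter(text.lower().split())


def augment(representation, model):
    """Add concepts that are implicit in the text but absent from the string."""
    augmented = {term: float(count) for term, count in representation.items()}
    for term, count in representation.items():
        for concept, weight in model.get(term, {}).items():
            augmented[concept] = augmented.get(concept, 0.0) + count * weight
    return augmented


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


first = lexical_representation("my poodle")
second = lexical_representation("a beagle")

# The two strings share no lexical items, so a purely lexical match finds nothing.
lexical_score = cosine(first, second)

# After augmentation both texts carry the implied concepts "dog" and "pet".
semantic_score = cosine(augment(first, SEMANTIC_MODEL), augment(second, SEMANTIC_MODEL))
```

In this sketch the lexical score is zero because the strings have no words in common, while the augmented representations overlap on the implied concepts, which is the kind of sharpening and matching the passage above describes.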
While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.